Section: Partnerships and Cooperations

National initiatives

Equipex ORTOLANG

ANR ARTIS

This contract started in January 2009 in collaboration with LTCI (Paris), Gipsa-Lab (Grenoble) and IRIT (Toulouse). Its main purpose is the acoustic-to-articulatory inversion of speech signals. Unlike the European project ASPI, the approach followed in our group focuses on standard spectral input data, i.e., cepstral vectors. The objective of the project is to develop a demonstrator enabling the inversion of speech signals in the domain of second language learning.

This year the work focused on the development of inversion from cepstral input data. We particularly worked on comparing cepstral vectors calculated on natural speech with those obtained via the articulatory-to-acoustic mapping. Bilinear frequency warping was combined with an affine adaptation of the cepstral coefficients; together, these two adaptation strategies enable a very good recovery of vocal tract shapes from natural speech. The second topic studied was access to the codebook. Two pruning strategies, a simple one using the spectral peak corresponding to F2 and a more elaborate one applying lax dynamic programming to spectral peaks, enable very efficient access to the articulatory codebook used for inversion.
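The bilinear frequency warping mentioned above can be realized with the classic all-pass recursion on cepstral coefficients (the `freqt` algorithm popularized by the SPTK toolkit); the sketch below illustrates the idea, with the affine adaptation step shown as a hypothetical matrix/offset pair (`A`, `b`) that would, under the project's approach, be estimated from paired natural and mapped cepstra:

```python
import numpy as np

def warp_cepstrum(c, order, alpha):
    """Bilinear frequency warping of a cepstrum via the classic
    all-pass recursion (the 'freqt' algorithm used e.g. in SPTK).
    c     : input cepstral coefficients c[0..M]
    order : order of the warped cepstrum
    alpha : warping factor (alpha = 0 leaves the cepstrum unchanged)
    """
    wc = np.zeros(order + 1)
    for i in range(len(c) - 1, -1, -1):      # feed c[M], ..., c[0]
        prev = wc.copy()
        wc[0] = c[i] + alpha * prev[0]
        if order >= 1:
            wc[1] = (1.0 - alpha ** 2) * prev[0] + alpha * prev[1]
        for m in range(2, order + 1):
            wc[m] = prev[m - 1] + alpha * (prev[m] - wc[m - 1])
    return wc

def affine_adapt(wc, A, b):
    # Hypothetical affine adaptation on top of the warping: A and b
    # are assumed to be estimated from natural/mapped cepstral pairs.
    return A @ wc + b
```

With `alpha = 0` the recursion reduces to the identity (up to truncation or zero padding), which gives a quick sanity check of an implementation.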

This year, the project also focused on articulatory synthesis in order to generate better consonant/vowel/consonant sequences by developing time patterns that coordinate source and vocal tract dynamics.

ANR ViSAC

This contract started in January 2009 in collaboration with the Magrit Inria team. The purpose of this project is to develop synthesis techniques in which speech is considered a bimodal signal whose acoustic and visual components are processed simultaneously. This is done by concatenating bimodal diphone units, that is, units comprising both acoustic and visual information; the latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on the classic target and join costs of acoustic-only synthesis, augmented with a visual join cost. In this final year of the project, we performed an extensive evaluation of the synthesis system using perceptual and subjective evaluations. The overall outcome of the evaluation indicates that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both the acoustic and visual channels [22] .
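As a sketch of how such bimodal unit selection could be set up (not the team's actual implementation), the toy Viterbi search below combines a target cost with an acoustic join cost and a weighted visual join cost; the unit field names and the weight `w_visual` are illustrative assumptions:

```python
import numpy as np

def target_cost(unit, spec):
    # Distance between a unit's average acoustic features and the target spec.
    return np.linalg.norm(unit["ac_mean"] - spec)

def join_cost(u, v, w_visual):
    # Acoustic discontinuity at the concatenation point ...
    ac = np.linalg.norm(u["ac_end"] - v["ac_start"])
    # ... augmented with a visual discontinuity (e.g. 3D lip features).
    vi = np.linalg.norm(u["vis_end"] - v["vis_start"])
    return ac + w_visual * vi

def select_units(candidates, targets, w_visual=1.0):
    """Viterbi search over a lattice of candidate diphone units.
    candidates[t] is the list of units matching target position t."""
    cost = [[target_cost(u, targets[0]) for u in candidates[0]]]
    back = []
    for t in range(1, len(targets)):
        row, brow = [], []
        for u in candidates[t]:
            totals = [cost[t - 1][j] + join_cost(p, u, w_visual)
                      for j, p in enumerate(candidates[t - 1])]
            j_best = int(np.argmin(totals))
            row.append(totals[j_best] + target_cost(u, targets[t]))
            brow.append(j_best)
        cost.append(row)
        back.append(brow)
    # Backtrack the cheapest path through the lattice.
    j = int(np.argmin(cost[-1]))
    path = [j]
    for brow in reversed(back):
        j = brow[j]
        path.append(j)
    return path[::-1]
```

Raising `w_visual` penalizes visually discontinuous concatenations more heavily, which is one simple way to trade acoustic smoothness against visual smoothness in a bimodal system.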

ANR ORFEO

In this project, we provided an automatic alignment at the word and phoneme levels for audio files from the TCOF corpus (Traitement de Corpus Oraux en Français). This corpus contains mainly spontaneous speech, recorded under various conditions with a wide SNR range and a lot of overlapping speech. We tested different acoustic models and different adaptation methods for the forced alignment.

ANR-DFG IFCASL

The work has mainly focused on the design of a corpus of French sentences and text to be recorded by German speakers learning French, the recording of a corpus of German sentences read by French speakers, and tools for annotating the French and German corpora. Beforehand, two small preliminary corpora were designed and recorded in order to bring to the fore the most interesting phonetic issues to be investigated in the project. In addition, this preliminary work was used to test the recording devices, so as to guarantee the same recording quality in Saarbrücken and in Nancy, and to design and develop the recording software.

In this project, we also provided an automatic alignment procedure at the word and phoneme levels for four corpora: French sentences uttered by French speakers, French sentences uttered by German speakers, German sentences uttered by French speakers, and German sentences uttered by German speakers.

ANR ContNomina

FUI RAPSODIE

ADT FASST

The Inria Action de Développement Technologique (ADT) FASST (2012–2014) is conducted by PAROLE in collaboration with the PANAMA and TEXMEX teams of Inria Rennes. It aims to reimplement the Flexible Audio Source Separation Toolbox (FASST), originally developed in Matlab by A. Ozerov, E. Vincent and F. Bimbot in the METISS team of Inria Rennes, as efficient C++ code. This will enable the application of FASST to larger data sets, and its use by a larger audience. The new C++ version will be released in early 2014. The second year of the project will be devoted to integrating FASST with speech recognition software in order to perform noise-robust speech recognition.

ADT VisArtico

The Inria Technological Development Action (ADT) VisArtico started in November 2013 (11/2013 - 10/2015). The purpose of this project is to develop and improve VisArtico, an articulatory visualization software package. In addition to improving the basic functionality, several articulatory analysis and processing features will be integrated. We will also work on the integration of multimodal data.